Abstract: Optimal dissemination schemes have previously been studied for
peer-to-peer live streaming applications. Live streaming being a delay-sensitive 
application, fine tuning of dissemination parameters is crucial. In this paper, 
we investigate optimal sizing of chunks, the units of data exchange, and probe
sets, the number of peers a given node probes before transmitting chunks. Chunk
size can have significant impact on diffusion rate (chunk miss ratio), diffusion 
delay, and overhead. The size of the probe set can also affect these metrics, 
primarily through the choices available for chunk dissemination. We perform 
extensive simulations on the so-called random-peer, latest-useful dissemination 
scheme. Our results show that size does matter, with the optimal size being not 
too small in both cases. 
Epidemic chunk-based diffusion: size matters

Summary: The problem of peer-to-peer live streaming consists in delivering
content to a set of peers with the best possible quality while minimizing the
lag, that is, the delivery delay from the source to the peers. In many
solutions, the content is split into pieces of fixed size, the chunks. Assuming
that the transfer time of a chunk from one peer to another is determined solely
by the bandwidth of the sender, the best achievable delay is typically of the
order of $\frac{c}{s}\log_2(n)$, where c is the chunk size, s the rate of the
streamed content and n the number of peers. It then seems natural to choose c
as small as possible in order to minimize the delay. There comes a point,
however, where the latencies present in the network necessarily affect the
diffusion of the content.

Our goal is to highlight the phenomena that appear when the chunk size makes
latency effects non-negligible. Based on a simple epidemic diffusion mechanism,
we highlight the following effects:

• chunks that are too small prevent the algorithm from working efficiently,
and cause a high loss rate as well as a waste of network resources;

• beyond a certain size, losses stop and the amount of control messages
stabilizes, the delay being proportional to the chunk size;

• between the two lies an interval of sizes suited to diffusion. The choice
of a precise size depends on the trade-off to be made between losses and
delay.

In addition, we observe that introducing some parallelism in the diffusion
shifts the useful zone, thereby increasing the performance of the algorithm.

Keywords: Peer-to-peer, live streaming, delay, chunk size
1 Introduction

Peer-to-peer data transfer has been a dominant source of network traffic for
the past few years. Peer-to-peer transfer mechanisms that rely on client
uploads have also been used recently for live video streaming solutions such
as PPLive [3] and CoolStreaming [11]. A natural question for live video
streaming is whether the delay and quality requirements can be met by
distributed client-based dissemination.

In most cases, a P2P live streaming algorithm splits the streams into chunks 
(also termed pieces). Chunks are considered as the atomic components of the 
stream, and a peer can only send chunks it has fully received. Much of the 
work in the literature has been devoted to the search for a chunk exchange 
policy that is feasible and optimal. We focus here on the unstructured epidemic 
approaches [3, 1, 8], where the policy is described by a scheme that indicates
which chunk a given peer should try to send, and to whom. 

A good scheme is indeed essential to the epidemic live streaming problem.
For a given scheme, however, optimization at a detailed level is also
important. This involves the fine tuning of dissemination parameters, such as
chunk size, receiver buffer size, number of peers to probe, etc. The chunk size
has a significant impact on performance, since smaller chunk sizes may be more
efficient but incur relatively higher overhead, and larger chunk sizes have
lower overhead but may result in higher delay. The receiver buffer size
(relative to chunk size) impacts the diversity in choice available to a peer
for transmission. In the scheme with random peer choice, probing more than one
peer for the decision of chunk exchange may help (power of choices), but it
also increases overhead. These are some of the finer details of any
dissemination scheme that must be closely examined.

There has been some study on parameter sizing for peer-to-peer file sharing
systems. In [6] it is shown that small chunk sizes are not always best for file
transfer; [1] proposes uplink allocation strategies designed to improve uplink
utilization of BitTorrent-like systems. However, results obtained for file
sharing systems are not directly applicable to live streaming applications.
First, a newly created chunk should be disseminated as fast as possible in live
streaming, so there is a strong delay component, naturally limiting the chunk
size. Second, missing chunks may be acceptable if a resilient codec is used, so
optimal values are not always comparable to those in the file transfer case.
Third, the buffer size, which is a parameter specific to streaming, can impact
the performance (see for instance [12]).

In this paper, we investigate dissemination parameters in peer-to-peer live 
video streaming through extensive simulations. Specifically, we focus on the 
rp/lu diffusion scheme, where a peer sends the latest (freshest) useful chunk 
to a randomly selected peer. We will also briefly consider other schemes for
comparison. We will show that indeed chunk size significantly impacts the 
performance. In fact, there is a range of chunk sizes that may be suitable, where 
the specific choice of the chunk size ultimately depends on the delay / chunk miss 
ratio trade-off. We will also show that a fine tuning of the number of peers to 
probe and the number of simultaneous chunks to send is important. 

The rest of the paper is organized as follows. In the next section we outline
our simulation framework. Section 3 covers the impact of the chunk size
and highlights the suitable range of chunk sizes among various dissemination
schemes. Section 4 examines the value of the number of peers to probe for
chunk dissemination. We finally conclude the paper in Section 5.

2 Methodology 

Epidemic diffusion schemes and their behavior have been extensively studied in 
the literature. See for instance [1] for a detailed study and references therein.
Here, we consider schemes where the sender first selects the destination peer 
and then sends a chunk. We focus on this particular class of schemes because 
they are fairly simple and thus allow us to focus on the impact of the parameters 
such as chunk size. Moreover, they are efficient in terms of diffusion rate and 
delay, and can potentially generate low overhead. 

In particular we consider three algorithms representative of this class: 

random peer / latest blind chunk (rp/lb). The destination peer is chosen
uniformly at random among the sender's neighbors, and the most recent
chunk in the sender's buffer is selected (regardless of whether the
receiver needs that chunk or not);

random peer / latest useful chunk (rp/lu). The destination peer is chosen
uniformly at random among the sender's neighbors, and the most
recent chunk not owned by the receiver is selected; unless otherwise
specified, this is the scheme considered in this paper;

bandwidth aware peer / latest useful chunk (ba/lu). This scheme is inspired
by [2]. A peer i is selected with a probability proportional to its
upload bandwidth $u_i$, and the most recent chunk not owned by the receiver
is selected. Note that for homogeneous upload bandwidths this is
equivalent to rp/lu.
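
As an illustration only, the peer and chunk selection rules of the three schemes can be sketched as below. This is not the simulator's code: the function names, the representation of buffers as sets of integer chunk identifiers (larger meaning more recent) and the `upload_bw` mapping are assumptions made for the example.

```python
import random

def select_peer(neighbors, upload_bw=None, rng=random):
    """Pick a destination peer.

    With upload_bw=None the choice is uniform (random peer schemes);
    otherwise it is proportional to upload bandwidth (bandwidth-aware scheme).
    """
    if upload_bw is None:
        return rng.choice(neighbors)
    weights = [upload_bw[p] for p in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]

def latest_blind(sender_chunks):
    """Most recent chunk held by the sender, ignoring the receiver's state."""
    return max(sender_chunks, default=None)

def latest_useful(sender_chunks, receiver_chunks):
    """Most recent sender chunk that the receiver does not own, or None."""
    return max(sender_chunks - receiver_chunks, default=None)
```

With homogeneous values in `upload_bw` the weighted draw is uniform, which matches the remark that the bandwidth-aware scheme then reduces to rp/lu.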

In order to analyze these algorithms under the same framework and derive
general results, we used an event-based simulator developed by the
Telecommunication Networks Group of Politecnico di Torino¹. The simulator has
been modified to take network latencies, control overhead and parallel upload
connections into account.

In our simulator we assume that the overlay network is an Erdős-Rényi graph
$G(n,p)$, where n is the size of the peer population and p is the probability
that a link connecting two peers exists. Every peer i therefore has a partial
view of the overlay network, with an average number of neighbors $p(n-1)$.
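
A minimal sketch of such an overlay, assuming a plain adjacency-set representation and a smaller population than in the experiments so that it runs instantly (the function name and the seed are illustrative):

```python
import random

def erdos_renyi(n, p, seed=0):
    """Return {peer: set of neighbors}; each pair is linked with probability p."""
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors

overlay = erdos_renyi(n=200, p=0.05)
mean_degree = sum(len(v) for v in overlay.values()) / len(overlay)
# The expected mean degree is p * (n - 1) = 9.95; the sample value is close to it.
```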

We assume that every link connecting a pair of peers {i, j} is characterized by
a constant round trip delay $RTT_{ij}$ and is lossless. We further assume that
there are no queuing or processing delays, so the transfer delay (the time for
a chunk or control packet to travel from peer i to peer j) is equal to the
transmission delay plus $RTT_{ij}/2$. The choice of such a network model allows
us to obtain results that are not affected by the overlay network structure or
by transport network congestion or losses. A peer is characterized by its
upload bandwidth $u_i$. There is a single source S with upload capacity $u_S$
and a limited overlay knowledge as well.
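
The delay model can be written as a one-line helper. The sketch below assumes that the one-way latency is half the round trip delay and that the transmission delay is the message size divided by the sender's upload bandwidth; the numeric values are only illustrative.

```python
def transfer_delay(size_mb, upload_mbps, rtt_s):
    """One-way delay in seconds: transmission time plus half the round trip delay."""
    transmission = size_mb / upload_mbps
    return transmission + rtt_s / 2

# Illustrative values: a 0.15 Mb chunk, a 1.03 Mb/s uplink and a 100 ms RTT.
delay = transfer_delay(0.15, 1.03, 0.100)
```

For small messages the latency term dominates, which is why latencies cannot be ignored once chunks become small.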

Every peer periodically selects a subset of m of its neighbors, according to
one of the aforementioned algorithms (that is, random or bandwidth-aware
selection), and probes them in order to discover their missing chunks, except
in the case of the latest blind scheme. We refer to the set of neighbors probed
as the probe set. Based on the responses received, if any, the peer then
transmits the corresponding chunks.

¹ http://www.napa-wine.eu/cgi-bin/twiki/view/Public/P2PTVSim

A peer can upload chunks to at most m' peers in parallel by fairly
sharing its upload bandwidth. It may happen that a peer cannot serve m'
recipients because it does not have enough useful chunks. In that case it
uploads the chunks faster (since there are fewer than m' active connections),
but it may stay idle for the subsequent period of time (because it needs to
acquire new chunk maps from newly selected peers). An additional overhead is
taken into account at every peer to reply to control messages coming from
potential sender peers.
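
The effect of fair bandwidth sharing on the upload time can be sketched as follows. The helper name and arguments are hypothetical, and equal sharing among the active connections is the only behavior modeled here.

```python
def parallel_upload_time(chunk_mb, upload_mbps, useful_recipients, m_prime):
    """Time in seconds to finish one round of parallel uploads.

    At most m_prime connections are active; the upload bandwidth is split
    equally among them, so fewer active connections finish sooner.
    """
    active = min(useful_recipients, m_prime)
    if active == 0:
        return 0.0  # nothing useful to send in this round
    per_connection_rate = upload_mbps / active
    return chunk_mb / per_connection_rate

full_round = parallel_upload_time(0.15, 1.03, useful_recipients=3, m_prime=3)
short_round = parallel_upload_time(0.15, 1.03, useful_recipients=1, m_prime=3)
```

Here `short_round` is one third of `full_round`: the peer finishes early and may then wait for new chunk maps.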

Unless otherwise stated, we consider a network of n = 1000 peers, all with the
same upload bandwidth $u_i$ = 1.03 Mb/s, an unlimited download bandwidth and
about 50 neighbors (p = 0.05). We set the stream rate s = 0.9 Mb/s. Latencies
between nodes are taken from the data set of the Meridian project [9]. A buffer
of size up to 300 chunks is available at all peers, in order to avoid possible
missing chunks due to buffer shortage (this implies a buffer size proportional
to the chunk size).

3 Chunk Size and performance 

When considering a streaming algorithm, a crucial performance metric is the
diffusion rate/diffusion delay/overhead trade-off achieved by that algorithm,
which can be summarized by a (chunk miss ratio, delay, overhead) triplet.

Following [5], we define the chunk miss ratio as the asymptotic probability 
to miss a chunk (or equivalently the difference between the stream rate s and the 
actual goodput), while average diffusion delay is defined as the delay between 
the creation of a chunk and its reception by a peer, averaged over the successful 
chunk transmissions. Note that since links are lossless, a peer misses a given
chunk only if none of its neighbors has scheduled that chunk for it. The overhead 
is defined as the difference between the bandwidth used by peers (throughput) 
and the actual data received (goodput). In our framework, this overhead is due 
only to control messages exchanged between peers. 
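
To make the three definitions concrete, the sketch below computes the triplet for a single peer from hypothetical logs. The dictionary layout and the relation goodput = s × (1 − miss ratio) are assumptions of this example, not a description of the simulator.

```python
def performance_triplet(creation_times, reception_times,
                        throughput_mbps, stream_rate_mbps):
    """Return (chunk miss ratio, average delay in s, overhead in Mb/s).

    creation_times:  {chunk_id: time the source created the chunk}
    reception_times: {chunk_id: time this peer received it}, received chunks only
    """
    delays = [reception_times[c] - t
              for c, t in creation_times.items() if c in reception_times]
    miss_ratio = 1 - len(delays) / len(creation_times)
    # The delay is averaged over successful transmissions only.
    avg_delay = sum(delays) / len(delays) if delays else float("nan")
    goodput = stream_rate_mbps * (1 - miss_ratio)
    overhead = throughput_mbps - goodput
    return miss_ratio, avg_delay, overhead

created = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
received = {0: 2.0, 1: 4.0, 3: 4.0}          # chunk 2 was never received
triplet = performance_triplet(created, received,
                              throughput_mbps=0.8, stream_rate_mbps=0.9)
```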

As a first experiment, we analyze the performance triplet as a function of
the chunk size. The results are shown in Figures 1 to 3 for the rp/lu scheme
with m = m' varying from 1 to 5.

3.1 Chunk miss ratio 

In Figure 1 we observe two cases:

• For large chunks (in our experiment, c greater than a few hundred kilobits;
the exact value depends on the number of simultaneous connections m),
there are no missing chunks.

• As the chunk size goes below a certain critical value, chunks start to go
missing, at a rate roughly proportional to the logarithm of the chunk size.

This phenomenon can be explained as follows: the time between two consecutive
chunks is c/s, and is therefore proportional to the chunk size c. When
c is big enough (all other parameters being the same), we can assume that more
and more control messages per chunk can be exchanged between peers. This
should achieve a proper diffusion, provided enough bandwidth is available, since
a sender peer will have enough time to find a neighbor needing a given chunk.
On the contrary, when c/s is too small, peers do not have enough time to
exchange control messages, resulting in missing chunks. Note that increasing m
slightly improves the performance.




Figure 1: Chunk miss ratio as a function of the chunk size in Mb (m = m'
varying from 1 to 5).



3.2 Delay 

The average diffusion delay as a function of the chunk size is shown in
Figure 2. The main result is that the delay is proportional to the chunk size
(hence the linear x-axis used; although difficult to observe on the figure, the
proportional relationship was also verified for small values). We also note
that it grows with m.

This result is consistent with theoretical results obtained in [8], where RTT
is neglected and the chunk transmission time is simply considered inversely
proportional to the sender's bandwidth. Under that framework, the minimal
diffusion delay is given by:

$$\frac{m\,c\,\ln(n)}{\ln(1+m)\,s}$$
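
As a quick numerical check of this bound, the sketch below evaluates it as m · c · ln(n) / (ln(1 + m) · s), which is our reading of the expression above; the function name is illustrative. For m = 1 it reduces to (c/s) · log2(n).

```python
import math

def min_diffusion_delay(m, c_mb, n, s_mbps):
    """Latency-free lower bound on the diffusion delay, in seconds."""
    return m * c_mb * math.log(n) / (math.log(1 + m) * s_mbps)

# Parameters of the experiments: n = 1000 peers, s = 0.9 Mb/s, c = 0.15 Mb.
bound_m1 = min_diffusion_delay(1, 0.15, 1000, 0.9)   # about 1.66 s
bound_m5 = min_diffusion_delay(5, 0.15, 1000, 0.9)
```

The bound is linear in c and increases with m, in line with the two trends reported for the simulated delay.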



3.3 Overhead 

The performance with respect to overhead, i.e. the difference between the
throughput and goodput, is shown in Figure 3 (only the curves for m = 1
and m = 5 are displayed for legibility). For very small chunks, we have a
non-intuitive trend where, as c grows, the goodput increases and the
throughput decreases (or equivalently, the overhead decreases faster than the
goodput increases). This process slows down, so that at some point the
throughput increases again. For big enough chunks, the overhead becomes roughly
constant (for a given m), while the goodput becomes equal to the stream rate
(meaning no missing chunks).

Figure 2: Average diffusion delay as a function of the chunk size

The goal of this paper is not to give a complete explanation of the observed
results, but rather to give some intuitions behind the overhead behavior. For
very small chunks, the chunk miss ratio is high, which, as mentioned earlier,
comes from the fact that not enough control messages can be sent.
Asymptotically, we may imagine that only one control message per sent chunk is
produced, resulting in an overhead/goodput ratio of $c_c/c$, where $c_c$ is the
size of a control message.

On the other hand, in the limit as the chunk size is increased, we may expect
that a peer can send a number of messages per sent chunk that is proportional
to the chunk characteristic time c/s. This would result in an overhead ratio
proportional to $c_c/s$, and thus independent of c (but not of other parameters
like the median RTT or m).
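
The two regimes can be illustrated with a small sketch. The control message size and the rate `msgs_per_second` (the proportionality constant between messages per chunk and c/s) are hypothetical values chosen for the example; only the shape of the result matters.

```python
def overhead_ratio_small_chunks(control_mb, chunk_mb):
    """One control message per sent chunk: the ratio grows as chunks shrink."""
    return control_mb / chunk_mb

def overhead_ratio_large_chunks(control_mb, chunk_mb, s_mbps, msgs_per_second):
    """Messages per chunk proportional to the characteristic time c/s."""
    msgs_per_chunk = msgs_per_second * chunk_mb / s_mbps
    return msgs_per_chunk * control_mb / chunk_mb   # chunk_mb cancels out

CONTROL_MB = 0.001        # hypothetical control message size
ratio_a = overhead_ratio_large_chunks(CONTROL_MB, 0.3, 0.9, msgs_per_second=20)
ratio_b = overhead_ratio_large_chunks(CONTROL_MB, 1.0, 0.9, msgs_per_second=20)
```

`ratio_a` and `ratio_b` coincide, which is the plateau seen for large chunks, whereas the small-chunk ratio doubles each time c is halved.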




Figure 3: Goodput and throughput as a function of the chunk size in Mb, the
overhead being the difference. The stream rate s is also indicated.
3.4 Suitable range for c

In light of the study above, there is a well-defined order of magnitude for a
suitable chunk size in epidemic live streaming. For the parameters considered
here, c should be greater than 0.06 Mb (which corresponds to about 15 chunks
per second) and smaller than 0.3 Mb (3 chunks per second):

• sending the stream at more than 15 chunks per second is good for the delay
(which stays roughly proportional to c), but results in both an increase in
throughput and a decrease in goodput;

• goodput and throughput are stationary for c greater than 0.3 Mb: using 
bigger chunks only means longer delay; 

• between these values, the choice of c results in a chunk miss ratio/delay 
trade-off: smaller delay with some missing chunks or greater delay with 
no missing chunks. Choosing a precise value for c depends then on factors 
that will not be discussed here, such as the codec used, the required QoS, 
etc. 
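
The correspondence between chunk size and chunk rate used above is simply s/c. The sketch below reproduces it for the stream rate of the experiments and classifies a chunk size against the two bounds quoted in the text; the function names and labels are illustrative.

```python
STREAM_RATE_MBPS = 0.9
RANGE_LOW_MB, RANGE_HIGH_MB = 0.06, 0.3   # bounds reported for these parameters

def chunks_per_second(chunk_mb, s_mbps=STREAM_RATE_MBPS):
    """Number of chunks the source creates per second."""
    return s_mbps / chunk_mb

def classify(chunk_mb):
    """Position of a chunk size relative to the suitable range."""
    if chunk_mb < RANGE_LOW_MB:
        return "below: lower delay, but losses and extra throughput"
    if chunk_mb > RANGE_HIGH_MB:
        return "above: no losses, but longer delay"
    return "within: delay / chunk miss ratio trade-off"
```

For instance, `chunks_per_second(0.06)` is about 15 and `chunks_per_second(0.3)` is about 3, matching the figures given in the text.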

In our experiments the suitable range for chunk size begins when the chunk
characteristic time (c/s) has the same order of magnitude as the median RTT,
and ends an order of magnitude later. We scaled the RTT distribution used in
order to observe the evolution of the range with the median RTT. The results,
reported in Figure 4, show that the range values are indeed roughly
proportional to the median RTT.
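
This observation suggests a rule of thumb, sketched below under the strong assumption that the range starts exactly at c/s = median RTT and spans exactly one order of magnitude. The text only states orders of magnitude, so the returned values are rough indications rather than measured bounds.

```python
def suitable_chunk_range(median_rtt_s, s_mbps, span=10.0):
    """Heuristic (low, high) chunk sizes in Mb for a given median RTT."""
    low = s_mbps * median_rtt_s      # chunk size for which c/s equals the RTT
    return low, span * low

low_50, high_50 = suitable_chunk_range(0.050, 0.9)
low_100, high_100 = suitable_chunk_range(0.100, 0.9)
```

Doubling the median RTT doubles both ends of the range. For comparison, the lower bound of 0.06 Mb reported above with s = 0.9 Mb/s corresponds to c/s of about 67 ms.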

Note that the lower bound of the suitable range gives an indication of the
minimal delay that can be achieved without too many missing chunks or too much
overhead. In Section 4 we will see that enhanced diffusion techniques can help
lower that bound.

Figure 4: Suitable range (for m' = m)

We have performed experiments using various diffusion schemes, RTT and
bandwidth distributions, values of the probe set size m, stream rate s and so
on. Not all results are reported here for lack of space but, even if the values
of a given metric may differ, we always observed the existence of a suitable
range for c.
As an example, in the following we compare the suitable range of chunk size
for two RTT values and the three following dissemination schemes: rp/lu, rp/lb
and ba/lu. Since the scenarios with homogeneous bandwidths are identical under
the rp/lu and ba/lu schemes, we use a heterogeneous bandwidth distribution
derived from [10]. We set m = m' = 1, and we plot the throughput, goodput
and average delay for these cases using two values of latency, RTT = 50 ms and
100 ms (Figure 5).

Note that the scheme rp/lb suffers high chunk miss ratios for all chunk sizes
considered. Indeed, it has been shown [1] that this scheme performs
poorly with respect to rate, while being optimal with respect to delay. The
scheme rp/lu has fewer missing chunks but higher delay, while the performance
of ba/lu lies between the other schemes for both chunk miss ratio and delay.

However, beyond the fact that the chunk miss ratio/delay/overhead trade-off
is closely related to the scheme (more complete studies are available elsewhere
[1, 8, 2]), the striking observation is that all these schemes admit a similar
suitable range for c, which seems to scale with the median RTT of the network.
This supports our claim that the suitable range for c depends mainly on the
median RTT and s (the inter-chunk delay c/s should have roughly the same order
of magnitude as the RTT), the actual scheme being secondary.



Figure 5: rp/lb, rp/lu and ba/lu comparison. (a) Average goodput and
throughput; (b) Average diffusion delay. Both panels are plotted against the
chunk size in Mb.



4 Size of Probe set 

In the results presented so far, we have assumed that the number of chunks
exchanged simultaneously, m', is identical to the size of the probe set m. We
now consider the impact of probing more peers than the number of simultaneous
chunks sent.
A larger probe set affords a sender peer a higher chance to find a recipient peer 
for whom it has useful chunks (power of choices principle). However, it also 
increases overhead, and possibly delay. 



Figure 6(a) plots the chunk miss ratio/delay trade-off for various m'/m pairs.
The scheme is rp/lu, the bandwidth is homogeneous and the chunk size is set to
c = 0.15 Mb (middle of the suitable range). The figure shows that using m' = m
is not optimal, and that having a larger probe set, m > m', significantly
reduces both delay and missing chunks. The delay decreases from about 10 s for
the m = m' case to less than 4 s for the 1/3, ..., 6 cases (meaning m' = 1 and
m = 3, ..., 6). With regard to the chunk miss ratio, there are some m'/m pairs
for which no missing chunks could be observed in our experiment: 1/3 to 1/6,
2/5 to 2/6, 3/5 to 3/6, and 4/6. This suggests that a consequence of using
m' < m is a shift of the suitable range for c.



Figure 6: m'/m chunk miss ratio/delay trade-off for two values of c, plotted
against the chunk miss ratio [%]. (a) c = 0.15 Mb (middle of the suitable
range for m = m'); (b) c = 0.035 Mb (below the suitable range for m = m').

In order to verify this interpretation, we now set c = 0.035 Mb, which is
clearly below the suitable range observed in § 3.4 for m = m'. The results are
shown in Figure 6(b).

We observe that no pair (m'/m) can achieve diffusion without missing chunks
for such a small c. However, the trade-offs are still worthwhile with respect
to the delay: using m'/m = 2/6, we get a delay of 1.7 s with a chunk miss ratio
of about 0.02 %. This indicates that c = 0.035 Mb is definitely within the
suitable range for m'/m = 2/6.

Also note how the relative efficiency of the various m'/m values is impacted
by the choice of c: for instance, 1/6, which is optimal for c = 0.15 Mb, performs
rather poorly for c = 0.035 Mb. Although the results presented here refer to the
rp/lu scheme, we performed experiments with other schemes and we observed
similar trends, confirming that using a proper m' < m can significantly improve
the delay.

On the other hand, there is a price for going below the suitable range defined
in § 3.4: for a given scheme, the overhead still depends on m and c. For rp/lu,
it stays close to the overhead displayed in Figure 3 even for m' < m. So using
a small c with m' < m can reduce the delay, but it requires more throughput.



5 Conclusion 

We have investigated the dissemination parameters of peer-to-peer epidemic live 
video streaming through extensive simulations. We have shown that the chunk 
size significantly impacts performance and that the chunk size should fall within 
a given range which is mostly determined by the median RTT of the network 
and the stream rate. 
We have also shown that the size of the probe set affects the performance of
diffusion schemes and, in particular, that a probe set larger than the actual
number of concurrent connections may improve miss ratio/delay performance by
modifying the suitable chunk size ranges.